Rodney Dyer, PhD
Rarely do we keep all but the most simple of data in a single table.
Consider the two tables to the right.
Common column labeled Key
Other data columns (only 1 in each for brevity)
A primary key is a column in a table that uniquely identifies a single row. There MUST be a unique identifier to be a Primary Key.
A foreign key is one that references a primary key in another table.
Key Grade
1 A 117
2 B 97
3 A 86
4 D 102
5 B 93
Here the Key column is referencing a unique row of data in the people data table.
We can combine the data in these tables in several different ways based upon what we are looking for. When thinking of joins, we must think about the how we want to select the overlapping sets of keys in both data.frames.
Full Join (aka outer join)
Left Join
Right Join
Inner Join
Example data table structure
The outer join has all the data from both left & right tables. All keys are present in the result.
All homework and people data.
The left join is one where the result has all the keys from the left but only those in the right one that are in the left.
The Right join results in all the keys from the right data table and the matching ones from the left.
Inner joins result in the intersection of keys.
There are times when the results you are looking for should be based upon items in another data.frame object but should not contain information from that second table.
We can also use joins to filter values within one data.frame. Here the semi_join() keeps everything in the left data that has a key in the right one, but importantly it does not import the right data columns into the result.
The opposite of the semi_join() is the anti_join() which drops everything in the left table that has a key in the right one, leaving only the ones that are unique.
data.frame object OBJECTID Name Ordinance OrdinanceP
Min. : 5.0 Length:91 Length:91 Length:91
1st Qu.: 270.5 Class :character Class :character Class :character
Median : 377.0 Mode :character Mode :character Mode :character
Mean : 498.3
3rd Qu.: 505.0
Max. :1872.0
Conditiona AdoptionDa Comment CreatedBy
Length:91 Min. :2000-01-01 Length:91 Length:91
Class :character 1st Qu.:2000-01-01 Class :character Class :character
Mode :character Median :2001-10-08 Mode :character Mode :character
Mean :2005-06-16
3rd Qu.:2008-02-18
Max. :2020-02-24
CreatedDat EditBy EditDate
Min. :2020-08-24 Length:91 Min. :2020-08-24
1st Qu.:2020-08-24 Class :character 1st Qu.:2020-08-24
Median :2020-08-24 Mode :character Median :2020-08-24
Mean :2020-08-24 Mean :2020-08-24
3rd Qu.:2020-08-24 3rd Qu.:2020-08-24
Max. :2020-08-24 Max. :2020-08-24
GlobalID Shape__Are Shape__Len geometry
Length:91 Min. : 4466 Min. : 371 MULTIPOLYGON : 4
Class :character 1st Qu.: 84963 1st Qu.: 1388 POLYGON :87
Mode :character Median : 270545 Median : 2823 epsg:2284 : 0
Mean : 2697532 Mean : 6941 +proj=lcc ...: 0
3rd Qu.: 1653270 3rd Qu.: 8281
Max. :65772905 Max. :69888
[1] "OBJECTID" "Name" "Ordinance" "OrdinanceP" "Conditiona"
[6] "AdoptionDa" "Comment" "CreatedBy" "CreatedDat" "EditBy"
[11] "EditDate" "GlobalID" "Shape__Are" "Shape__Len" "geometry"
Simple feature collection with 1 feature and 14 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 11781130 ymin: 3727746 xmax: 11781350 ymax: 3728014
Projected CRS: NAD83 / Virginia South (ftUS)
OBJECTID Name Ordinance OrdinanceP Conditiona AdoptionDa Comment
5 5 B-1 <NA> <NA> No 2000-01-01 <NA>
CreatedBy CreatedDat EditBy EditDate
5 richard.morton_cor 2020-08-24 richard.morton_cor 2020-08-24
GlobalID Shape__Are Shape__Len
5 f1305477-4e71-463c-a202-332971d8c5e1 33261.3 762.5714
geometry
5 POLYGON ((11781140 3727834,...
FID Carriagewa AssetID StreetType
Min. :29123 Min. : 16873 Length:2120 Length:2120
1st Qu.:33102 1st Qu.: 50429 Class :character Class :character
Median :37238 Median : 72438 Mode :character Mode :character
Mean :38715 Mean : 63099
3rd Qu.:43488 3rd Qu.: 74663
Max. :57927 Max. :139427
Functional FIPS LeftFromAd LeftToAddr
Length:2120 Length:2120 Length:2120 Length:2120
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
RightFromA RightToAdd PrefixDire ProperName
Length:2120 Length:2120 Length:2120 Length:2120
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
SuffixType SuffixDire FullName RouteName
Length:2120 Length:2120 Length:2120 Length:2120
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
OneWay PostedSpee CADRouteSp CreatedBy
Length:2120 Min. : 0.0 Min. : 5.00 Length:2120
Class :character 1st Qu.:25.0 1st Qu.:25.00 Class :character
Mode :character Median :25.0 Median :25.00 Mode :character
Mean :25.9 Mean :21.53
3rd Qu.:25.0 3rd Qu.:25.00
Max. :55.0 Max. :85.00
NA's :465
CreatedDat EditBy EditDate
Min. :2020-05-20 Length:2120 Min. :2020-05-20
1st Qu.:2020-05-20 Class :character 1st Qu.:2020-05-20
Median :2020-05-20 Mode :character Median :2020-05-20
Mean :2020-05-20 Mean :2020-05-20
3rd Qu.:2020-05-20 3rd Qu.:2020-05-20
Max. :2020-05-20 Max. :2020-05-20
GlobalID SHAPE_Leng geometry
Length:2120 Min. : 13.64 LINESTRING :2119
Class :character 1st Qu.: 135.81 MULTILINESTRING: 1
Mode :character Median : 189.10 epsg:2284 : 0
Mean : 262.39 +proj=lcc ... : 0
3rd Qu.: 373.35
Max. :7872.64
fanRoads %>%
select( FullName, OneWay, StreetType,
SpeedLimit = PostedSpee, Length = SHAPE_Leng,
geometry) %>%
mutate( OneWay = factor( OneWay ),
StreetType = factor( StreetType) ) -> fanRoads
summary( fanRoads ) FullName OneWay StreetType SpeedLimit
Length:2120 FT : 329 Alley : 480 Min. : 0.0
Class :character TF : 352 Artery : 380 1st Qu.:25.0
Mode :character NA's:1439 Highway : 18 Median :25.0
Private : 8 Mean :25.9
Ramp : 11 3rd Qu.:25.0
Secondary:1223 Max. :55.0
NA's :465
Length geometry
Min. : 13.64 LINESTRING :2119
1st Qu.: 135.81 MULTILINESTRING: 1
Median : 189.10 epsg:2284 : 0
Mean : 262.39 +proj=lcc ... : 0
3rd Qu.: 373.35
Max. :7872.64
st_intersects()fanRoads %>%
filter( st_intersects( fanRoads, target,
sparse = FALSE ) == TRUE ) %>%
st_drop_geometry() %>%
select( `Street Name` = FullName ) %>%
arrange( `Street Name` ) %>%
unique() Street Name
1 Allison St
8 Birch St
12 Boyd St
17 Floyd Ave
34 Grove Ave
49 Hanover Ave
62 Kensington Ave
65 Lombardy Pl
68 Madumbie Lane
69 Monument Ave
96 N Allen Ave
119 N Arthur Ashe Blvd
121 N Brunswick St
122 N Davis Ave
152 N Granby St
161 N Harrison St
169 N Harvie St
175 N Lombardy St
186 N Meadow St
202 N Morris St
208 N Mulberry St
213 N Plum St
220 N Robinson St
237 N Rowland St
252 N Shields Ave
269 N Stafford Ave
283 N Vine St
294 Park Ave
317 Ryland St
318 S Plum St
319 S Stafford Ave
320 Strawberry St
342 Stuart Ave
350 Stuart Cir
363 Trouvaille Al
366 W Franklin St
374 W Main St
380 West Ave
384 <NA>
Four data.frames are contained within the library representing all the flights from NYC airports in 2013 (you may need to install this package).
airlines
airports
planes
weather
flights
# A tibble: 6 × 8
faa name lat lon alt tz dst tzone
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 04G Lansdowne Airport 41.1 -80.6 1044 -5 A America/Ne…
2 06A Moton Field Municipal Airport 32.5 -85.7 264 -6 A America/Ch…
3 06C Schaumburg Regional 42.0 -88.1 801 -6 A America/Ch…
4 06N Randall Airport 41.4 -74.4 523 -5 A America/Ne…
5 09J Jekyll Island Airport 31.1 -81.4 11 -5 A America/Ne…
6 0A9 Elizabethton Municipal Airport 36.4 -82.2 1593 -5 A America/Ne…
# A tibble: 6 × 9
tailnum year type manuf…¹ model engines seats speed engine
<chr> <int> <chr> <chr> <chr> <int> <int> <int> <chr>
1 N10156 2004 Fixed wing multi engine EMBRAER EMB-… 2 55 NA Turbo…
2 N102UW 1998 Fixed wing multi engine AIRBUS… A320… 2 182 NA Turbo…
3 N103US 1999 Fixed wing multi engine AIRBUS… A320… 2 182 NA Turbo…
4 N104UW 1999 Fixed wing multi engine AIRBUS… A320… 2 182 NA Turbo…
5 N10575 2002 Fixed wing multi engine EMBRAER EMB-… 2 55 NA Turbo…
6 N105UW 1999 Fixed wing multi engine AIRBUS… A320… 2 182 NA Turbo…
# … with abbreviated variable name ¹manufacturer
# A tibble: 6 × 15
origin year month day hour temp dewp humid wind_dir wind_speed wind_gust
<chr> <int> <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 EWR 2013 1 1 1 39.0 26.1 59.4 270 10.4 NA
2 EWR 2013 1 1 2 39.0 27.0 61.6 250 8.06 NA
3 EWR 2013 1 1 3 39.0 28.0 64.4 240 11.5 NA
4 EWR 2013 1 1 4 39.9 28.0 62.2 250 12.7 NA
5 EWR 2013 1 1 5 39.0 28.0 64.4 260 12.7 NA
6 EWR 2013 1 1 6 37.9 28.0 67.2 240 11.5 NA
# … with 4 more variables: precip <dbl>, pressure <dbl>, visib <dbl>,
# time_hour <dttm>
# ℹ Use `colnames()` to see all variable names
This is the main data.frame.